Bernoulli 16(3), 2010, 759-779 
DOI: 10.3150/09-BEJ232 



Asymptotic properties of maximum likelihood estimators in models with multiple change points

HEPING HE¹ and THOMAS A. SEVERINI²

¹Department of Mathematics, University of Kansas, 1460 Jayhawk Blvd., Lawrence, KS 66045, USA. E-mail: hhe@math.ku.edu

²Department of Statistics, Northwestern University, Evanston, IL 60208, USA. E-mail: severini@northwestern.edu

Models with multiple change points are used in many fields; however, the theoretical properties of maximum likelihood estimators of such models have received relatively little attention. The goal of this paper is to establish the asymptotic properties of maximum likelihood estimators of the parameters of a multiple change-point model for a general class of models in which the form of the distribution can change from segment to segment and in which, possibly, there are parameters that are common to all segments. Consistency of the maximum likelihood estimators of the change points is established and the rate of convergence is determined; the asymptotic distribution of the maximum likelihood estimators of the parameters of the within-segment distributions is also derived. Since the approach used in single change-point models is not easily extended to multiple change-point models, these results require the introduction of new tools for analyzing the likelihood function in a multiple change-point model.

Keywords: change-point fraction; common parameter; consistency; convergence rate; 
Kullback-Leibler distance; within-segment parameter 

1. Introduction 

A change-point model for a sequence of independent random variables $X_1, \ldots, X_n$ is a model in which there exist unknown change points $n_1^0, \ldots, n_k^0$, with $0 = n_0^0 < n_1^0 < \cdots < n_k^0 < n_{k+1}^0 = n$, such that, for each $j = 1, 2, \ldots, k+1$, $X_{n_{j-1}^0+1}, \ldots, X_{n_j^0}$ are identically distributed with a distribution that depends on $j$. Here, we consider parametric change-point models in which the distribution of $X_{n_{j-1}^0+1}, \ldots, X_{n_j^0}$ is parametric; however, the form of the distribution can be different for each $j$. Change-point models are used in many fields. For example, Broemeling and Tsurumi (1987) use a multiple change-point model for the US demand for money; Lombard (1986) uses a multiple change-point model to model the effect of sudden changes in wind direction on the flight of a projectile; Reed (1998) uses a multiple change-point model in the analysis of forest fire data. A number of authors have used multiple change-point models in the analysis of DNA sequences; see, for example, Braun and Müller (1998), Fu and Curnow (1990a, 1990b) and Halpern (2000). Many further examples are provided in the monographs Chen and Gupta (2000) and Csörgő and Horváth (1997).

The goal of this paper is to establish the asymptotic properties of maximum likelihood 
estimators of the parameters of a multiple change-point model, under easily verifiable 
conditions. These results are based on the following model. Assume that the vectors $x_1, x_2, \ldots, x_n$ in the data set are independently drawn from the parametric model

$$ f_j(\psi^0, \theta_j^0; x_i), \qquad n_{j-1}^0 + 1 \le i \le n_j^0, \quad j = 1, 2, \ldots, k+1, $$

where $f_j(\psi^0, \theta_j^0; x)$ is a probability density function of a continuous distribution with unknown common parameter $\psi^0$ for all $j = 1, 2, \ldots, k+1$ and unknown within-segment parameters $\theta_j^0$ for each $j = 1, 2, \ldots, k+1$; $f_j(\psi^0, \theta_j^0; x)$ may have the same functional form for some or all of $j = 1, 2, \ldots, k+1$; $\psi^0$ may be a vector; and $\theta_j^0$ may be a vector parameter whose dimension differs across $j = 1, 2, \ldots, k+1$. In this model, there are $k$ unknown change points $n_1^0, n_2^0, \ldots, n_k^0$, where the number of change points $k$ is assumed to be known. The parameter $\psi^0$ is common to all segments.

There are a number of results available on the asymptotic properties of parameter 
estimators in change-point models. See, for example, Hinkley (1970, 1972), Hinkley and 
Hinkley (1970), Bhattacharya (1987), Fu and Curnow (1990a, 1990b), Jandhyala and
Fotopoulos (1999, 2001) and Hawkins (2001); the two monographs Chen and Gupta
(2000) and Csörgő and Horváth (1997) have detailed bibliographies on this topic.

In particular, Hinkley (1970) considers likelihood-based inference for a single change- 
point model, obtaining the asymptotic distribution of the maximum likelihood estimator 
of the change point under the assumption that the other parameters in the model are 
known. Hinkley (1970) and Hinkley (1972) argue that this asymptotic distribution is also 
valid when the parameters are unknown. 

Unfortunately, there are problems in extending the approach used in Hinkley (1970, 
1972) to the setting considered here. The method used in Hinkley (1970, 1972) is based 
on considering the relative locations of a candidate change point and the true change 
point. When there is only a single change point, there are only three possibilities: the 
candidate change point is either greater than, less than or equal to the true change point. 
However, in models with $k$ change points, the relative positions of the candidate change
points and the true change points can become quite complicated, and the simplicity and
elegance of the single change-point argument is lost.

A second problem arises when extending the argument for the case in which the change 
points are the only parameters in the model to the case in which there are unknown 
within-segment parameters. The consistency argument used in the former case is ex- 
tended to the latter case using a "consistency assumption" (Hinkley (1972), Section 4.1); 
this condition is discussed in Appendix A and examples are given which show that this 
assumption is a strong one that is not generally satisfied in the class of models considered 
here. 
There are relatively few results available on the asymptotic properties of maximum likelihood estimators in multiple change-point models. The present paper makes three contributions. First, in the general model described above, in which there is a fixed, but arbitrary, number of change points, we show under relatively weak regularity conditions that the maximum likelihood estimators of the change-point fractions $n_j^0/n$ are consistent and converge to the true change-point fractions at the rate $O_p(n^{-1})$. Second, since a simple extension of the approach used in single change-point models is not available, we introduce the tools necessary for analyzing the likelihood function in a multiple change-point model. Finally, we derive the asymptotic distribution of the maximum likelihood estimators of the parameters of the within-segment distributions for the general case described above, in which the form of the distribution can change from segment to segment and in which, possibly, there are parameters that are common to all segments.

The paper is organized as follows. The asymptotic theory of maximum likelihood es- 
timators of a multiple change-point model is described in Section 2. Section 3 contains 
a numerical example illustrating these results and Section 4 contains some discussion of 
future research which builds on the results given in this paper. Appendix A discusses 
the "consistency assumption" used in Hinkley (1972); all technical proofs are given in 
Appendix B. 

2. Asymptotic theory 

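Consider estimation of the multiple change-point model introduced in Section 1. For any change-point configuration $0 = n_0 < n_1 < n_2 < \cdots < n_k < n_{k+1} = n$, the log-likelihood function is given by

$$ l(n_1, \ldots, n_k, \theta_1, \ldots, \theta_{k+1}, \psi) = \sum_{j=1}^{k+1} \sum_{i=n_{j-1}+1}^{n_j} \log f_j(\psi, \theta_j; x_i). $$

Estimators of all change points, all within-segment parameters and the common parameter are given by

$$ (\hat n_1, \ldots, \hat n_k, \hat\theta_1, \ldots, \hat\theta_{k+1}, \hat\psi) = \mathop{\arg\max}_{0 < n_1 < \cdots < n_k < n;\ \theta_j \in \Theta_j,\ j = 1, \ldots, k+1;\ \psi \in \Psi} l(n_1, \ldots, n_k, \theta_1, \ldots, \theta_{k+1}, \psi), $$

where $\Theta_j$, $j = 1, \ldots, k+1$, and $\Psi$ are the parameter spaces of $\theta_j$ and $\psi$, respectively.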
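When there is no common parameter $\psi$, the within-segment maximization can be done segment by segment, so the joint maximization over $0 < n_1 < \cdots < n_k < n$ reduces to a shortest-path problem over segment costs and can be solved exactly by dynamic programming. The sketch below is a minimal illustration under that assumption only, with unit-variance normal segments whose means play the role of $\theta_j$; the function names are hypothetical and this is not the authors' code. With a common parameter $\psi$ the profile likelihood no longer splits additively across segments, so this shortcut would not apply directly.

```python
import random

def _segment_cost(prefix, prefix_sq, a, b):
    """Half the within-segment residual sum of squares of x[a:b]; this equals the negative
    maximised log-likelihood of N(theta, 1) data on that segment, up to an additive constant."""
    m = b - a
    s = prefix[b] - prefix[a]
    return 0.5 * ((prefix_sq[b] - prefix_sq[a]) - s * s / m)

def mle_change_points(x, k):
    """Exact maximiser of the profile log-likelihood over 0 < n_1 < ... < n_k < n,
    by dynamic programming in O(k n^2) time. Returns [n_1, ..., n_k], where n_j is the
    (1-based) index of the last observation in segment j, as in the text."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)
    inf = float("inf")
    # best[m][b]: minimal cost of splitting x[:b] into m + 1 non-empty segments
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    for b in range(1, n + 1):
        best[0][b] = _segment_cost(prefix, prefix_sq, 0, b)
    for m in range(1, k + 1):
        for b in range(m + 1, n + 1):
            for a in range(m, b):  # a = n_m, the m-th change point
                c = best[m - 1][a] + _segment_cost(prefix, prefix_sq, a, b)
                if c < best[m][b]:
                    best[m][b], back[m][b] = c, a
    # walk back from the full series to recover n_k, ..., n_1
    cps, b = [], n
    for m in range(k, 0, -1):
        b = back[m][b]
        cps.append(b)
    return cps[::-1]

if __name__ == "__main__":
    rng = random.Random(1)
    means = [0.0] * 30 + [5.0] * 30 + [0.0] * 30
    x = [rng.gauss(mu, 1.0) for mu in means]
    print(mle_change_points(x, 2))  # should be at or near [30, 60]
```

Swapping `_segment_cost` for another family's negative maximised segment log-likelihood gives the same exact search for any model without common parameters.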



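Let

$$ \lambda_j = n_j/n, \qquad \lambda_j^0 = n_j^0/n \qquad \text{for } j = 1, 2, \ldots, k, $$

$$ \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k), \quad \lambda^0 = (\lambda_1^0, \lambda_2^0, \ldots, \lambda_k^0), \quad \theta = (\theta_1, \theta_2, \ldots, \theta_{k+1}), \quad \theta^0 = (\theta_1^0, \theta_2^0, \ldots, \theta_{k+1}^0). $$

Note that $\lambda^0$ is taken to be a constant vector as $n$ goes to infinity.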
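Define

$$ l_j^{(n)}(\psi, \theta_j) = \sum_{i=n_{j-1}+1}^{n_j} \log f_j(\psi, \theta_j; x_i), \qquad l_j^{(0)}(\psi, \theta_j) = \sum_{i=n_{j-1}^0+1}^{n_j^0} \log f_j(\psi, \theta_j; x_i), \qquad j = 1, 2, \ldots, k+1, $$

$$ l(\psi, \theta) = \sum_{j=1}^{k+1} \sum_{i=n_{j-1}+1}^{n_j} \log f_j(\psi, \theta_j; x_i), \qquad l^{(0)}(\psi, \theta) = \sum_{j=1}^{k+1} \sum_{i=n_{j-1}^0+1}^{n_j^0} \log f_j(\psi, \theta_j; x_i). $$

Writing $\phi = (\psi, \theta)$ and $E[\cdot\,; \phi]$ for expectation under the parameter value $\phi$, the expected information matrix is given by

$$ i(\psi, \theta) = E[-l^{(0)}_{\phi\phi}(\psi, \theta); \phi], $$

whose diagonal block corresponding to $\theta$ is

$$ E[-l^{(0)}_{\theta\theta}(\psi, \theta); \phi] = \operatorname{diag}\bigl(E[-l^{(0)}_{1,\theta_1\theta_1}(\psi, \theta_1); \phi], E[-l^{(0)}_{2,\theta_2\theta_2}(\psi, \theta_2); \phi], \ldots, E[-l^{(0)}_{k+1,\theta_{k+1}\theta_{k+1}}(\psi, \theta_{k+1}); \phi]\bigr), $$

where $\operatorname{diag}(\cdot)$ denotes a block-diagonal matrix whose diagonal blocks are the arguments in parentheses and whose other elements are zero; the average expected information matrix is given by

$$ \bar i(\psi, \theta) = \lim_{n \to \infty} \frac{1}{n} i(\psi, \theta). $$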
The asymptotic properties of these estimators are based on the following regularity conditions. Apart from the parts concerning change points, these conditions are similar to those typically required for the consistency and asymptotic normality of maximum likelihood estimators in models without change points; see, for example, Wald (1949). In particular, compactness of the parameter spaces is a common assumption in the classical likelihood literature.

These conditions are different from those required by Ferger (2001) and Döring (2007),
who consider estimation of change points in a nonparametric setting in which nothing 
is assumed about the within-segment distributions, using a type of nonparametric M- 
estimator based on empirical processes. Thus, these authors do not require conditions 
on the within-segment likelihood functions; on the other hand, their method does not 
provide estimators of within-segment parameters. 

Assumption 2.1. It is assumed that, for $j = 1, 2, \ldots, k$, $f_{j+1}(\psi^0, \theta_{j+1}^0; x) \ne f_j(\psi^0, \theta_j^0; x)$ on a set of non-zero measure.

This assumption guarantees that the distributions in two neighboring segments are 
different; clearly, this is required for the change points to be well defined. 

Assumption 2.2. It is assumed that:

1. for $j = 1, 2, \ldots, k+1$, $\theta_j$ and $\theta_j^0$ are contained in $\Theta_j$, where $\Theta_j$ is a compact subset of $\mathbb{R}^{d_j}$; $\psi$ and $\psi^0$ are contained in $\Psi$, where $\Psi$ is a compact subset of $\mathbb{R}^{d}$; here, $d, d_1, \ldots, d_{k+1}$ are non-negative integers;

2. $l(\psi, \theta)$ is third-order continuously differentiable with respect to $\psi$ and $\theta$;

3. the expectations of the first- and second-order derivatives of $l^{(0)}(\psi, \theta)$ with respect to $\phi$ exist for $\phi$ in its parameter space.

Compactness of the parameter space is used to establish the consistency of the maximum likelihood estimators of $n_1/n, \ldots, n_k/n, \theta_1, \ldots, \theta_{k+1}, \psi$; see, for example, Bahadur (1971) for further discussion of this condition and its necessity in general models. Under further conditions on the models, the compactness assumption may be avoidable, but establishing this appears to be a substantial task for future work. Differentiability of the log-likelihood function is used to justify certain Taylor series expansions. Parts 1 and 2 of Assumption 2.2 are relatively weak and are essentially the same as conditions used in parametric models without change points; see, for example, Schervish (1995), Section 7.3. Part 3 is very weak and is used in the proof of Theorem 2.3.

Assumption 2.3. It is assumed that:

1. for any $j = 1, 2, \ldots, k+1$ and any integers $s, t$ satisfying $0 \le s < t \le n$,

$$ E\Bigl[\max_{\theta_j \in \Theta_j,\, \psi \in \Psi} \Bigl(\sum_{i=s+1}^{t} \{\log f_j(\psi, \theta_j; X_i) - E[\log f_j(\psi, \theta_j; X_i)]\}\Bigr)^2\Bigr] \le C(t-s)^r, $$

where $r < 2$ and $C$ is a constant;

2. for any $j = 1, 2, \ldots, k+1$ and any integers $s, t$ satisfying $n_{j-1}^0 \le s < t \le n_j^0$,

$$ E\Bigl[\max_{\theta_j \in \Theta_j,\, \psi \in \Psi} \Bigl(\sum_{i=s+1}^{t} \{[\log f_j(\psi, \theta_j; X_i) - \log f_j(\psi^0, \theta_j^0; X_i)] - v(\psi, \theta_j; \psi^0, \theta_j^0)\}\Bigr)^2\Bigr] \le D(t-s)^r, $$

where $v(\psi, \theta_j; \psi^0, \theta_j^0)$ is introduced in equation (2), $r < 2$ and $D$ is a constant.

Parts 1 and 2 of Assumption 2.3 are technical requirements on the behavior of the log-likelihood function between and within segments, respectively. These conditions ensure that the information regarding the within- and between-segment parameters grows quickly enough to establish consistency and asymptotic normality of the parameter estimators. They are relatively weak; in particular, it is easy to check that they are satisfied by exponential family distributions. Consider a probability density function of exponential family form:

$$ f(\eta; x) = h(x)c(\eta)\exp\Bigl\{\sum_{q=1}^{m} w_q(\eta)t_q(x)\Bigr\}. $$

Since the term $\log c(\eta)$ cancels after centering, a straightforward application of the Cauchy-Schwarz inequality gives

$$ \Bigl(\sum_{i=s+1}^{t} \{\log f(\eta; X_i) - E[\log f(\eta; X_i)]\}\Bigr)^2 \le \Bigl(1 + \sum_{q=1}^{m} w_q(\eta)^2\Bigr)\Bigl[\Bigl(\sum_{i=s+1}^{t} (\log h(X_i) - E\log h(X_i))\Bigr)^2 + \sum_{q=1}^{m}\Bigl(\sum_{i=s+1}^{t} (t_q(X_i) - E\,t_q(X_i))\Bigr)^2\Bigr]. $$

Therefore, Part 1 of Assumption 2.3 is satisfied with $r = 1$: each function $w_q(\eta)$, assumed to be continuous, attains its maximum on the compact parameter space, and, since the $X_i$ are independent, the expectation of each squared centered sum on the right-hand side is at most a constant multiple of $t - s$ (provided $\log h(X_i)$ and $t_q(X_i)$ have finite variances). Similarly, Part 2 of Assumption 2.3 is also satisfied with $r = 1$.
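As a rough numerical illustration of the $r = 1$ claim, the following Monte Carlo sketch takes the normal location family $N(\eta, 1)$ (an assumption chosen for simplicity: here $m = 1$, $w_1(\eta) = \eta$, $t_1(x) = x$, and the parameter space is taken to be $[-1, 2]$) and estimates the left-hand side of Part 1 of Assumption 2.3 divided by $t - s$ for two block lengths. It also checks the Cauchy-Schwarz bound above on every replicate. The helper names are hypothetical. If the condition holds with $r = 1$, the two ratios should be of similar size.

```python
import random

def centered_parts(xs, mu0):
    """For N(eta, 1) written as h(x) c(eta) exp{eta * x}, return (a, b) with
    a = sum_i [log h(X_i) - E log h(X_i)] and b = sum_i [X_i - E X_i], so that
    sum_i {log f(eta; X_i) - E[log f(eta; X_i)]} = a + eta * b. Data assumed N(mu0, 1)."""
    a = -0.5 * sum(v * v - (mu0 * mu0 + 1.0) for v in xs)  # log h(x) = -x^2/2 - log(2 pi)/2
    b = sum(v - mu0 for v in xs)                            # t(x) = x, w(eta) = eta
    return a, b

def scaled_second_moment(m, eta_lo, eta_hi, mu0, reps, rng):
    """Monte Carlo estimate of E[max_{eta in [eta_lo, eta_hi]} (a + eta b)^2] / m.
    (a + eta b)^2 is convex in eta, so its maximum over the interval is at an endpoint.
    Also checks the Cauchy-Schwarz bound (a + eta b)^2 <= (1 + eta^2)(a^2 + b^2)."""
    total, bound_ok = 0.0, True
    for _ in range(reps):
        xs = [rng.gauss(mu0, 1.0) for _ in range(m)]
        a, b = centered_parts(xs, mu0)
        best = 0.0
        for eta in (eta_lo, eta_hi):
            s2 = (a + eta * b) ** 2
            bound = (1.0 + eta * eta) * (a * a + b * b)
            bound_ok = bound_ok and s2 <= bound * (1.0 + 1e-12)
            best = max(best, s2)
        total += best
    return total / reps / m, bound_ok

if __name__ == "__main__":
    rng = random.Random(2010)
    for m in (50, 400):
        print(m, scaled_second_moment(m, -1.0, 2.0, 0.5, 800, rng))
```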

The main results of this paper are given in the following three theorems. 



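Theorem 2.1 (Consistency). Under Assumption 2.1, Part 1 of Assumption 2.2 and Part 1 of Assumption 2.3, $\hat\lambda_i \to \lambda_i^0$, $\hat\theta_j \to \theta_j^0$ and $\hat\psi \to \psi^0$ in probability as $n \to +\infty$; that is, $\hat\lambda_i - \lambda_i^0 = o_p(1)$, $\hat\theta_j - \theta_j^0 = o_p(1)$ and $\hat\psi - \psi^0 = o_p(1)$, where $\hat\lambda_i = \hat n_i/n$ for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, k+1$.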



Note that $\hat n_i$, $i = 1, 2, \ldots, k$, are not consistent (Hinkley (1970)); it is the estimators of the change-point fractions $\lambda_i^0$, $i = 1, 2, \ldots, k$, that are consistent. The consistency of $\hat\theta_j$, $j = 1, 2, \ldots, k+1$, and $\hat\psi$ is the same as the corresponding result in classical likelihood theory for independent, identically distributed data.

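Theorem 2.2 (Convergence rate). Under Assumptions 2.1-2.3, we have

$$ \lim_{\delta \to \infty} \lim_{n \to \infty} P\bigl(n\|\hat\lambda - \lambda^0\|_\infty > \delta\bigr) = 0, $$

where $\hat\lambda = (\hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_k)$ and $\|\hat\lambda - \lambda^0\|_\infty = \max_{1 \le j \le k} |\hat\lambda_j - \lambda_j^0|$. That is, $\hat\lambda_i - \lambda_i^0 = O_p(n^{-1})$ for $i = 1, 2, \ldots, k$.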
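Theorem 2.2 says that $n|\hat\lambda_i - \lambda_i^0| = |\hat n_i - n_i^0|$ stays bounded in probability. A rough Monte Carlo sketch, assuming a single change in the mean of unit-variance normal data with both segment means unknown ($k = 1$, no common parameter; the helper names are hypothetical), illustrates this: the average absolute error in the change-point index should stay roughly flat as $n$ grows, so the error in the fraction $\hat\lambda_1$ shrinks like $1/n$.

```python
import random

def mle_single_change_point(x):
    """MLE of n_1 for one change in the mean of unit-variance normal data with both
    segment means unknown: minimise the pooled within-segment residual sum of squares."""
    n = len(x)
    total = sum(x)
    total_sq = sum(v * v for v in x)
    best_cp, best_rss, left = None, float("inf"), 0.0
    for cp in range(1, n):  # segments x[:cp] and x[cp:], i.e. 0 < n_1 < n
        left += x[cp - 1]
        right = total - left
        rss = total_sq - left * left / cp - right * right / (n - cp)
        if rss < best_rss:
            best_cp, best_rss = cp, rss
    return best_cp

def mean_abs_index_error(n, frac, shift, reps, rng):
    """Average |hat n_1 - n_1^0| = n * |hat lambda_1 - lambda_1^0| over simulated data sets."""
    n0 = int(frac * n)
    err = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n0)] + [rng.gauss(shift, 1.0) for _ in range(n - n0)]
        err += abs(mle_single_change_point(x) - n0)
    return err / reps

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (200, 800, 3200):
        # the average index error should not grow with n
        print(n, mean_abs_index_error(n, 0.4, 1.0, 100, rng))
```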

We now consider the asymptotic distribution of $\hat\phi$, where $\phi = (\psi, \theta)$.

Theorem 2.3 (Limiting distributions). Under Assumptions 2.1-2.3,

$$ \sqrt{n}(\hat\phi - \phi^0) \xrightarrow{d} N_{d+d_1+d_2+\cdots+d_{k+1}}\bigl(0, \bar i(\psi^0, \theta^0)^{-1}\bigr), $$

where $N_{d+d_1+d_2+\cdots+d_{k+1}}(0, \bar i(\psi^0, \theta^0)^{-1})$ is a $(d + d_1 + d_2 + \cdots + d_{k+1})$-dimensional multivariate normal distribution with mean vector zero and covariance matrix $\bar i(\psi^0, \theta^0)^{-1}$.

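The proofs of Theorems 2.1-2.3 are based on the following approach. Define a function $J$ by

$$ J = \sum_{j=1}^{k+1} \sum_{i=1}^{k+1} \frac{n_{ji}}{n} \int_{-\infty}^{+\infty} [\log f_j(\psi, \theta_j; x) - \log f_i(\psi^0, \theta_i^0; x)] f_i(\psi^0, \theta_i^0; x)\,dx + \frac{1}{n} \sum_{j=1}^{k+1} \sum_{i=n_{j-1}+1}^{n_j} \{\log f_j(\psi, \theta_j; x_i) - E[\log f_j(\psi, \theta_j; X_i)]\} - \frac{1}{n} \sum_{j=1}^{k+1} \sum_{i=n_{j-1}^0+1}^{n_j^0} \{\log f_j(\psi^0, \theta_j^0; x_i) - E[\log f_j(\psi^0, \theta_j^0; X_i)]\}, \qquad (1) $$

where $n_{ji}$ is the number of observations in the set $[n_{j-1}+1, n_j] \cap [n_{i-1}^0+1, n_i^0]$ for $i, j = 1, 2, \ldots, k+1$. The expectation terms in (1) cancel against the integrals, so that $nJ = l(n_1, \ldots, n_k, \theta_1, \ldots, \theta_{k+1}, \psi) - \sum_{j=1}^{k+1} \sum_{i=n_{j-1}^0+1}^{n_j^0} \log f_j(\psi^0, \theta_j^0; x_i)$, where the subtracted term does not depend on the parameters. Hence

$$ \mathop{\arg\max}_{0 < n_1 < \cdots < n_k < n;\ \theta_j \in \Theta_j, 1 \le j \le k+1;\ \psi \in \Psi} l = \mathop{\arg\max}_{0 < n_1 < \cdots < n_k < n;\ \theta_j \in \Theta_j, 1 \le j \le k+1;\ \psi \in \Psi} J; $$

thus, the maximum likelihood estimators may be defined as the maximizers of $J$ rather than as the maximizers of $l$.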
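The reason $J$ and $l$ have the same maximizers can be checked numerically. The sketch below assumes unit-variance normal segments with means $\theta_j$ and no common parameter $\psi$ (a simplification for illustration; the function names are hypothetical). It evaluates $J$ term by term as in (1), using the closed form $v(\theta_j; \theta_i^0) = -(\theta_j - \theta_i^0)^2/2$ for this family, and compares it with $n^{-1}$ times the difference between the log-likelihood at a candidate configuration and at the true values; the two should agree up to rounding for any candidate.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)

def log_f(x, theta):
    # log density of N(theta, 1) at x
    return -0.5 * LOG_2PI - 0.5 * (x - theta) ** 2

def expected_log_f(theta, true_mean):
    # E[log f(theta; X)] when X ~ N(true_mean, 1)
    return -0.5 * LOG_2PI - 0.5 * ((true_mean - theta) ** 2 + 1.0)

def v(theta, theta0):
    # minus the Kullback-Leibler distance from N(theta0, 1) to N(theta, 1)
    return -0.5 * (theta - theta0) ** 2

def segments(cps, n):
    # change points n_1 < ... < n_k -> 0-based half-open index ranges of the k + 1 segments
    b = [0] + list(cps) + [n]
    return [(b[j], b[j + 1]) for j in range(len(b) - 1)]

def J_formula(x, cps, thetas, true_cps, true_thetas):
    """Evaluate J term by term as in equation (1)."""
    n = len(x)
    cand_segs, true_segs = segments(cps, n), segments(true_cps, n)
    true_mean = [th0 for (c, d), th0 in zip(true_segs, true_thetas) for _ in range(c, d)]
    # integral term: sum_j sum_i (n_ji / n) v(theta_j; theta_i^0)
    first = sum(max(0, min(b, d) - max(a, c)) / n * v(th, th0)
                for (a, b), th in zip(cand_segs, thetas)
                for (c, d), th0 in zip(true_segs, true_thetas))
    # centred log-likelihood at the candidate configuration
    second = sum(log_f(x[t], th) - expected_log_f(th, true_mean[t])
                 for (a, b), th in zip(cand_segs, thetas) for t in range(a, b)) / n
    # centred log-likelihood at the true configuration
    third = sum(log_f(x[t], th0) - expected_log_f(th0, th0)
                for (c, d), th0 in zip(true_segs, true_thetas) for t in range(c, d)) / n
    return first + second - third

def loglik_difference(x, cps, thetas, true_cps, true_thetas):
    """(l(candidate) - l(true values)) / n, which J should equal exactly."""
    n = len(x)
    l_cand = sum(log_f(x[t], th) for (a, b), th in zip(segments(cps, n), thetas) for t in range(a, b))
    l_true = sum(log_f(x[t], th0) for (c, d), th0 in zip(segments(true_cps, n), true_thetas)
                 for t in range(c, d))
    return (l_cand - l_true) / n

if __name__ == "__main__":
    rng = random.Random(3)
    x = [rng.gauss(mu, 1.0) for mu in [0.0] * 40 + [2.0] * 30 + [-1.0] * 30]
    args = ([35, 80], [0.5, 1.5, -0.7], [40, 70], [0.0, 2.0, -1.0])
    print(J_formula(x, *args), loglik_difference(x, *args))
```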
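Let $v(\psi, \theta_j; \psi^0, \theta_i^0)$ be defined by

$$ v(\psi, \theta_j; \psi^0, \theta_i^0) = \int_{-\infty}^{+\infty} \log\frac{f_j(\psi, \theta_j; x)}{f_i(\psi^0, \theta_i^0; x)} f_i(\psi^0, \theta_i^0; x)\,dx \qquad (2) $$

for $i, j = 1, 2, \ldots, k+1$. Note that $J$ may be written $J = J_1 + J_2$, where

$$ J_1 = \sum_{j=1}^{k+1} \sum_{i=1}^{k+1} \frac{n_{ji}}{n} v(\psi, \theta_j; \psi^0, \theta_i^0) \qquad (3) $$

and

$$ J_2 = \frac{1}{n} \sum_{j=1}^{k+1} \sum_{i=n_{j-1}+1}^{n_j} \{\log f_j(\psi, \theta_j; x_i) - E[\log f_j(\psi, \theta_j; X_i)]\} - \frac{1}{n} \sum_{j=1}^{k+1} \sum_{i=n_{j-1}^0+1}^{n_j^0} \{\log f_j(\psi^0, \theta_j^0; x_i) - E[\log f_j(\psi^0, \theta_j^0; X_i)]\}. \qquad (4) $$

Alternatively, we may write

$$ J_2 = \frac{1}{n} \sum_{j=1}^{k+1} \sum_{i=1}^{k+1} \Bigl\{\sum_{t \in \Omega_{ji}} [\log f_j(\psi, \theta_j; x_t) - E(\log f_j(\psi, \theta_j; X_t))] - \sum_{t \in \Omega_{ji}} [\log f_i(\psi^0, \theta_i^0; x_t) - E(\log f_i(\psi^0, \theta_i^0; X_t))]\Bigr\}, \qquad (5) $$

where $\Omega_{ji} = [n_{j-1}+1, n_j] \cap [n_{i-1}^0+1, n_i^0]$.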

Note that $J_1$ is a weighted sum of negative Kullback-Leibler distances; it will be shown that $J_2$ approaches 0 as $n \to \infty$. Also, $v(\psi, \theta_j; \psi^0, \theta_i^0) \le 0$, with equality if and only if $f_j(\psi, \theta_j; x) = f_i(\psi^0, \theta_i^0; x)$ almost everywhere (Kullback and Leibler (1951)).

Lemma 2.1 gives a bound for $J_1$.

Lemma 2.1. Under Assumption 2.1 and Part 1 of Assumption 2.2, there exist positive constants $C_1$ and $C_2$ such that, for any $\lambda$ and $\phi$,

$$ J_1 \le -\max\{C_1\|\lambda - \lambda^0\|_\infty, C_2\rho(\phi, \phi^0)\}, $$

where $\|\lambda - \lambda^0\|_\infty = \max_j |\lambda_j - \lambda_j^0|$ and $\rho(\phi, \phi^0) = \max_j |v(\psi, \theta_j; \psi^0, \theta_j^0)|$.

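Lemma 2.2 gives probability bounds on the fluctuations of partial sums of the log-likelihood, both over arbitrary stretches of the data, which may cross segment boundaries (part (I)), and within a single true segment (part (II)).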
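Lemma 2.2. Under Part 1 of Assumption 2.2, the following two results follow from Parts 1 and 2 of Assumption 2.3, respectively:

(I) for any $j = 1, 2, \ldots, k+1$, any $0 \le m_1 < m_2 \le n$ and any $\varepsilon > 0$, there exist a constant $A_j$, independent of $\varepsilon$, and a constant $r < 2$ such that

$$ \Pr\Bigl(\max_{m_1 \le s < t \le m_2,\, \theta_j \in \Theta_j,\, \psi \in \Psi} \Bigl|\sum_{i=s+1}^{t} \{\log f_j(\psi, \theta_j; X_i) - E[\log f_j(\psi, \theta_j; X_i)]\}\Bigr| \ge \varepsilon\Bigr) \le \frac{A_j}{\varepsilon^2}(m_2 - m_1)^r; \qquad (6) $$

(II) for any $j = 1, 2, \ldots, k+1$ and any $\varepsilon > 0$, there exist a constant $B_j$, independent of $\varepsilon$, and a constant $r < 2$ such that

$$ \Pr\Bigl(\max_{n_{j-1}^0 \le s < t \le n_j^0,\, \psi \in \Psi,\, \theta_j \in \Theta_j} \Bigl|\sum_{i=s+1}^{t} \{[\log f_j(\psi, \theta_j; X_i) - \log f_j(\psi^0, \theta_j^0; X_i)] - v(\psi, \theta_j; \psi^0, \theta_j^0)\}\Bigr| \ge \varepsilon\Bigr) \le \frac{B_j}{\varepsilon^2}(n_j^0 - n_{j-1}^0)^r. \qquad (7) $$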

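In practical applications, it is useful to have an estimator of $\bar i(\psi^0, \theta^0)$. Write

$$ i(\psi, \theta) = \begin{pmatrix} E[-l_{\psi\psi}(\psi, \theta); \phi] & E[-l_{\psi\theta}(\psi, \theta); \phi] \\ E[-l_{\theta\psi}(\psi, \theta); \phi] & E[-l_{\theta\theta}(\psi, \theta); \phi] \end{pmatrix}, $$

and let $\hat i(\psi, \theta)$ be the matrix obtained by replacing these blocks with the outer-product sums

$$ \sum_{j=1}^{k+1} \sum_{i=\hat n_{j-1}+1}^{\hat n_j} \frac{1}{f_j(\psi, \theta_j; x_i)^2} f_{j\psi}(\psi, \theta_j; x_i) f_{j\psi}(\psi, \theta_j; x_i)^T, $$

$$ \sum_{i=\hat n_{j-1}+1}^{\hat n_j} \frac{1}{f_j(\psi, \theta_j; x_i)^2} f_{j\psi}(\psi, \theta_j; x_i) f_{j\theta_j}(\psi, \theta_j; x_i)^T, \qquad \sum_{i=\hat n_{j-1}+1}^{\hat n_j} \frac{1}{f_j(\psi, \theta_j; x_i)^2} f_{j\theta_j}(\psi, \theta_j; x_i) f_{j\theta_j}(\psi, \theta_j; x_i)^T, $$

for $j = 1, 2, \ldots, k+1$, where $f_{j\psi}$ and $f_{j\theta_j}$ denote the gradients of $f_j$ with respect to $\psi$ and $\theta_j$ (the blocks linking $\theta_j$ and $\theta_{j'}$, $j \ne j'$, are zero). Then $\hat i(\hat\psi, \hat\theta)/n$ is a consistent estimator of $\bar i(\psi^0, \theta^0)$.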
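A minimal sketch of this outer-product estimator, assuming normal segments with means $\theta_j$ and a common variance $\psi = \sigma^2$ (an illustrative choice; the helper names are hypothetical). Note that $f_{j\psi}/f_j$ is simply the score $\partial \log f_j / \partial\psi$, so each block is a sum of score outer products. For brevity the sums run over the true segments rather than the estimated ones; by Theorem 2.2 only $O_p(1)$ observations are affected. In this model $\bar i(\psi^0, \theta^0)$ is diagonal, with entry $1/(2\sigma^4)$ for $\psi$ and $(n_j^0 - n_{j-1}^0)/(n\sigma^2)$ for $\theta_j$.

```python
import random

def fit_given_change_points(x, cps):
    """MLEs of the segment means and the pooled (common) variance for fixed change points."""
    n = len(x)
    bounds = [0] + list(cps) + [n]
    means = [sum(x[a:b]) / (b - a) for a, b in zip(bounds[:-1], bounds[1:])]
    rss = sum((x[t] - m) ** 2 for m, a, b in zip(means, bounds[:-1], bounds[1:]) for t in range(a, b))
    return means, rss / n

def opg_information(x, cps, means, sigma2):
    """Outer-product-of-scores estimate hat-i / n for normal segments with means theta_j
    and common variance psi = sigma^2. Parameter order: (psi, theta_1, ..., theta_{k+1})."""
    n = len(x)
    bounds = [0] + list(cps) + [n]
    p = len(means) + 1
    info = [[0.0] * p for _ in range(p)]
    for j, theta in enumerate(means):
        for t in range(bounds[j], bounds[j + 1]):
            r = x[t] - theta
            s_psi = -0.5 / sigma2 + 0.5 * r * r / sigma2 ** 2  # d log f / d psi
            s_theta = r / sigma2                                # d log f / d theta_j
            info[0][0] += s_psi * s_psi
            info[0][j + 1] += s_psi * s_theta
            info[j + 1][0] += s_psi * s_theta
            info[j + 1][j + 1] += s_theta * s_theta
    return [[v / n for v in row] for row in info]

if __name__ == "__main__":
    rng = random.Random(42)
    sigma = 2.0
    x = [rng.gauss(mu, sigma) for mu in [0.0] * 2000 + [3.0] * 2000 + [-1.0] * 2000]
    cps = [2000, 4000]
    means, s2 = fit_given_change_points(x, cps)
    for row in opg_information(x, cps, means, s2):
        print([round(v, 4) for v in row])
    # for comparison: 1/(2 sigma^4) = 0.03125 and (1/3)/sigma^2 = 0.0833...
```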



3. An example 

Consider the problem of analyzing the mineral content of a core sample, which is extensively studied in Chen and Gupta (2000), Chernoff (1973) and Srivastava and Worsley (1986). In particular, we consider the data in Chernoff (1973) on the mineral content of 12 minerals in a core sample measured at $N = 53$ equally spaced points. Since some of the minerals have a very low assay, we follow Chen and Gupta (2000) and Srivastava and Worsley (1986) in analyzing only the $p = 5$ variables $Z_1, Z_8, Z_9, Z_{10}$ and $Z_{12}$ with the highest assays. Thus, we assume that $(Z_1, Z_8, Z_9, Z_{10}, Z_{12})$ has a 5-variate normal distribution with a within-segment mean vector and a variance-covariance matrix that is common to all segments. The analyses of Chen and Gupta (2000), Chernoff (1973) and Srivastava and Worsley (1986) suggest that there are 5 change points in the mean vector and, hence, we make that assumption here.

The 5 change points, the within-segment mean vectors and the common variance-covariance matrix were estimated by maximum likelihood. The estimated change points are 7, 20, 24, 32 and 41; these differ from the change points estimated by Chen and Gupta (2000), Chernoff (1973) and Srivastava and Worsley (1986), and we regard them as more reasonable. The difference arises because Chen and Gupta (2000), Chernoff (1973) and Srivastava and Worsley (1986) use binary segmentation procedures, which detect multiple change points one at a time rather than simultaneously, whereas the method in this paper estimates all of the change points simultaneously. The six estimated within-segment mean vectors are listed below, in order from left to right; for example, the two vectors on the first line are, respectively, the first and second within-segment mean vectors.

(287.14,58.57,25.71,240.00,422.86), (277.31,144.61,24.69,306.15,274.62), 
(321.25, 502.50, 150.00, 620.00, 217.50), (397.50, 635.00, 428.75, 625.00, 4.38), 
(470.00,188.89,214.44,255.56,108.89), (425.0,155.92,183.42,320.0,333.33). 

The estimated common variance-covariance matrix is

$$ \begin{pmatrix} 1485.71 & -966.03 & 569.41 & -421.41 & -590.87 \\ -966.03 & 8523.65 & 4649.95 & 5982.95 & 1054.22 \\ 569.41 & 4649.95 & 8767.11 & 4434.76 & 736.33 \\ -421.41 & 5982.95 & 4434.76 & 8768.49 & 780.03 \\ -590.87 & 1054.22 & 736.33 & 780.03 & 3193.37 \end{pmatrix}. $$
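For reference, the profile log-likelihood behind estimates of this kind can be written out explicitly. The derivation below is a standard multivariate normal calculation, not something stated in the paper, which does not describe how the maximization over the five change points was carried out. Here the within-segment parameters are the mean vectors $\theta_j = \mu_j$ and the common parameter is $\psi = \Sigma$; for fixed change points $0 = n_0 < n_1 < \cdots < n_5 < n_6 = N$ and observations $z_1, \ldots, z_N \in \mathbb{R}^p$,

$$ l(n_1, \ldots, n_5, \mu_1, \ldots, \mu_6, \Sigma) = -\frac{Np}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma| - \frac{1}{2}\sum_{j=1}^{6}\sum_{i=n_{j-1}+1}^{n_j}(z_i - \mu_j)^T\Sigma^{-1}(z_i - \mu_j), $$

$$ \hat\mu_j = \bar z_j = \frac{1}{n_j - n_{j-1}}\sum_{i=n_{j-1}+1}^{n_j} z_i, \qquad \hat\Sigma = \frac{1}{N}W, \qquad W = \sum_{j=1}^{6}\sum_{i=n_{j-1}+1}^{n_j}(z_i - \bar z_j)(z_i - \bar z_j)^T, $$

$$ \max_{\mu_1, \ldots, \mu_6, \Sigma} l = -\frac{Np}{2}\bigl(\log(2\pi) + 1\bigr) - \frac{N}{2}\log\Bigl|\frac{1}{N}W(n_1, \ldots, n_5)\Bigr|, $$

using $\operatorname{tr}(\hat\Sigma^{-1}W) = Np$. The change-point estimates therefore minimize the determinant of the pooled within-segment scatter matrix $W$. Because the common covariance matrix couples the segments, this criterion does not split into a sum of per-segment terms, unlike the case without common parameters.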



4. Discussion 

This paper establishes the consistency of maximum likelihood estimators of the parameters of a general class of multiple change-point models and gives the asymptotic distribution of the maximum likelihood estimators of the parameters of the within-segment distributions. The required regularity conditions are relatively weak and are generally satisfied by exponential family distributions.

Some important problems in the analysis of multiple change-point models were not 
considered here. One is that the asymptotic distribution of the maximum likelihood 
estimator of the vector of change points was not considered. The reason for this is that 
the methods used to determine this asymptotic distribution are quite different from the 
methods used to establish the consistency of the maximum likelihood estimator; see, for
example, Hinkley (1970) for a treatment of this problem in a single change-point model. 
Thus, this is essentially a separate research topic. However, the asymptotic properties 
obtained in this paper are necessary for the establishment of the asymptotic distribution 
of the maximum likelihood estimator of the vector of change points in this model. This 
will be a subject of future work. 

Another important problem is to extend the results of this paper to the case in which 
the number of change points is not known and must be determined from the data. 
Clearly, a likelihood-based approach to this problem will require an understanding of 
the properties of maximum likelihood estimators in the model in which the number of 
change points is known. Thus, the results of the present paper can be considered as a 
first step toward the development of a likelihood-based methodology that can be used to 
determine simultaneously the number and location of the change points. This is also a 
topic of future research. 



Appendix A: The consistency assumption of Hinkley (1972)

Consider a change-point model with a single change point, $n_1^0$, and suppose that there are no common parameters in the model. In Hinkley (1972), it is shown that $\hat n_1$, the maximum likelihood estimator of $n_1^0$, satisfies $\hat n_1 = n_1^0 + O_p(1)$ under the condition

$$ \sup_{\theta_1}\sum_{i=n_1^0+1}^{n_1^0+m}\{\log f_1(X_i; \theta_1) - \log f_2(X_i; \theta_2^0)\} \to -\infty \qquad (A.1) $$

with probability 1 as $m \to \infty$, which was described as a "consistency assumption". Note that the random variables in the sum, $X_{n_1^0+1}, \ldots, X_{n_1^0+m}$, are drawn from the distribution with density $f_2$.



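Suppose that

$$ \frac{1}{m}\sum_{i=n_1^0+1}^{n_1^0+m}\{\log f_1(X_i; \theta_1) - \log f_2(X_i; \theta_2^0)\} $$

converges to

$$ E\Bigl[\log\frac{f_1(X; \theta_1)}{f_2(X; \theta_2^0)}\Bigr] $$

as $m \to \infty$, uniformly in $\theta_1$, where $X$ is distributed according to the distribution with density $f_2(\cdot\,; \theta_2^0)$. Equation (A.1) then holds, provided that

$$ \sup_{\theta_1} E\Bigl[\log\frac{f_1(X; \theta_1)}{f_2(X; \theta_2^0)}\Bigr] < 0; $$

note that, by properties of the Kullback-Leibler distance and Assumption 2.1,

$$ E\Bigl[\log\frac{f_1(X; \theta_1)}{f_2(X; \theta_2^0)}\Bigr] < 0 $$

for each $\theta_1$.

Thus, condition (A.1) fails whenever the distribution corresponding to the density $f_2(\cdot\,; \theta_2^0)$ is in the closure, in a certain sense, of the set of distributions corresponding to densities of the form $f_1(\cdot\,; \theta_1)$.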

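One such case occurs if $f_1$ and $f_2$ have the same parametric form with parameters $\theta_1, \theta_2$, respectively, satisfying $\theta_1^0 \ne \theta_2^0$. For instance, suppose that the random variables in the first segment are normally distributed with mean $\theta_1^0$ and standard deviation 1 and the random variables in the second segment are normally distributed with mean $\theta_2^0$ and standard deviation 1. Then, taking the supremum over all real $\theta_1$,

$$ \sup_{\theta_1}\sum_{i=n_1^0+1}^{n_1^0+m}\{\log f_1(X_i; \theta_1) - \log f_2(X_i; \theta_2^0)\} = \frac{m}{2}(\bar X_m - \theta_2^0)^2, $$

where

$$ \bar X_m = \frac{1}{m}\sum_{i=n_1^0+1}^{n_1^0+m} X_i $$

is normally distributed with mean $\theta_2^0$ and variance $1/m$. Since $m(\bar X_m - \theta_2^0)^2$ has a $\chi^2_1$ distribution for every $m$, (A.1) clearly does not hold in this case.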
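The normal example can be checked by simulation. Assuming unit-variance normal segments and taking the supremum over all real $\theta_1$ (the helper names below are hypothetical), the sketch confirms that the direct sum at $\theta_1 = \bar X_m$ matches $(m/2)(\bar X_m - \theta_2^0)^2$, and that its average stays near $E[\chi^2_1]/2 = 0.5$ for every $m$ instead of drifting to $-\infty$ as (A.1) would require.

```python
import random

def sup_loglik_ratio(xs, theta2):
    """sup over real theta_1 of sum_i [log phi(x_i - theta_1) - log phi(x_i - theta_2)],
    phi the N(0, 1) density; the supremum is attained at theta_1 = mean(xs).
    Returns (direct evaluation, closed form m * (xbar - theta2)^2 / 2)."""
    m = len(xs)
    xbar = sum(xs) / m
    direct = sum(0.5 * (v - theta2) ** 2 - 0.5 * (v - xbar) ** 2 for v in xs)
    closed_form = 0.5 * m * (xbar - theta2) ** 2
    return direct, closed_form

def average_sup(m, theta2, reps, rng):
    """Monte Carlo mean of the supremum when X_i ~ N(theta2, 1), i.e. drawn from f_2."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(theta2, 1.0) for _ in range(m)]
        total += sup_loglik_ratio(xs, theta2)[1]
    return total / reps

if __name__ == "__main__":
    rng = random.Random(1972)
    for m in (10, 100, 1000):
        # stays near 0.5 rather than decreasing without bound
        print(m, round(average_sup(m, 0.0, 400, rng), 3))
```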

A similar situation occurs when the distribution with density $f_2(\cdot\,; \theta_2)$ can be viewed as a limit of the distributions with densities $f_1(\cdot\,; \theta_1)$. For instance, suppose that $f_1$ is the density of a Weibull distribution with rate parameter $\beta$ and shape parameter $\alpha$, $\theta_1 = (\alpha, \beta)$, $\beta \ne 1$, and $f_2$ is the density of an exponential distribution with rate parameter $\theta_2$.

In this appendix, we show that this is a strong assumption that is not generally satisfied by otherwise well-behaved models. For instance, suppose that $f_1$ and $f_2$ have the same functional form and that the difference between the two distributions is due to the fact that $\theta_1^0 \ne \theta_2^0$. Again, (A.1) will not hold.

Thus, the consistency condition used in Hinkley (1972) is too strong for the general 
model considered here. 



Appendix B: Technical details 



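Proof of Lemma 2.1. We first need some preliminary results that are used in this proof. For $i = 1, 2, \ldots, k$, let us define

$$ g_i(a, \phi^0) = \sup_{1 \le j \le k+1}\sup_{\theta_j \in \Theta_j}\sup_{\psi \in \Psi}[a\,v(\psi, \theta_j; \psi^0, \theta_{i+1}^0) + (1-a)\,v(\psi, \theta_j; \psi^0, \theta_i^0)], $$

where $0 \le a \le 1$. We then have that $g_i(0, \phi^0) = g_i(1, \phi^0) = 0$ for $i = 1, 2, \ldots, k$. Since $g_i(a, \phi^0)$ is a supremum of functions that are affine in $a$, it is a convex function of $a$ for any $i = 1, 2, \ldots, k$.

Let $G_i(\phi^0) = 2g_i(1/2, \phi^0)$. Because $a = 2a(1/2) + (1-2a)0$ for $0 \le a \le 1/2$, convexity of $g_i(a, \phi^0)$ gives that

$$ g_i(a, \phi^0) \le 2a\,g_i(1/2, \phi^0) = aG_i(\phi^0) \qquad \text{for } i = 1, 2, \ldots, k; $$

by the symmetric argument, $g_i(a, \phi^0) \le (1-a)G_i(\phi^0)$ for $1/2 \le a \le 1$. Noting that

$$ g_i(1/2, \phi^0) = \frac{1}{2}\sup_{1 \le j \le k+1}\sup_{\theta_j \in \Theta_j}\sup_{\psi \in \Psi}[v(\psi, \theta_j; \psi^0, \theta_{i+1}^0) + v(\psi, \theta_j; \psi^0, \theta_i^0)], $$

it follows from Assumption 2.1 that $G_i(\phi^0) < 0$. If we let $G(\phi^0) = \max_{1 \le i \le k} G_i(\phi^0)$, then $G(\phi^0) < 0$.

Let $\Delta^0 = \min_{0 \le j \le k}|\lambda_{j+1}^0 - \lambda_j^0|$, where $\lambda_0^0 = 0$ and $\lambda_{k+1}^0 = 1$. Consider a change-point fraction configuration $\lambda$ such that $\|\lambda - \lambda^0\|_\infty < \Delta^0/4$. For any $j = 1, 2, \ldots, k$, there are two cases: a candidate change-point fraction $\lambda_j$ may be on the left or on the right of the true change-point fraction $\lambda_j^0$.

For any $j$ with $\lambda_j$ on the right of $\lambda_j^0$, we have that $\lambda_{j-1} < \lambda_j^0 < \lambda_j$. Since every term of $J_1$ is non-positive,

$$ J_1 \le \frac{n_{jj}}{n}v(\psi, \theta_j; \psi^0, \theta_j^0) + \frac{n_{j,j+1}}{n}v(\psi, \theta_j; \psi^0, \theta_{j+1}^0). $$

If we define $a_{j,j+1} = n_{j,j+1}/(n_{j,j+1} + n_{jj})$, then the condition $\|\lambda - \lambda^0\|_\infty < \Delta^0/4$ gives that $a_{j,j+1} < 1/2$ and

$$ J_1 \le \frac{n_{jj} + n_{j,j+1}}{n}g_j(a_{j,j+1}, \phi^0) \le \frac{n_{jj} + n_{j,j+1}}{n}a_{j,j+1}G_j(\phi^0) = \frac{n_{j,j+1}}{n}G_j(\phi^0) \le (\lambda_j - \lambda_j^0)G(\phi^0). $$

For any $j$ with $\lambda_j$ on the left of $\lambda_j^0$, we have that $\lambda_j < \lambda_j^0 < \lambda_{j+1}$. Similarly, we define $a_{j+1,j} = n_{j+1,j}/(n_{j+1,j} + n_{j+1,j+1})$. Using the fact that $a_{j+1,j} < 1/2$ and the bound on $g_j$ for $1/2 \le a \le 1$, a similar argument gives that

$$ J_1 \le (\lambda_j^0 - \lambda_j)G(\phi^0). $$

Therefore, if $\|\lambda - \lambda^0\|_\infty < \Delta^0/4$, then we obtain that $J_1 \le \|\lambda - \lambda^0\|_\infty G(\phi^0)$. On the other hand,

$$ J_1 \le \min_{1 \le j \le k+1} v(\psi, \theta_j; \psi^0, \theta_j^0)\frac{n_{jj}}{n} = -\max_{1 \le j \le k+1}|v(\psi, \theta_j; \psi^0, \theta_j^0)|\frac{n_{jj}}{n}. $$

We have $n_{jj}/n > \Delta^0/2$ for any $j$, so

$$ J_1 \le -\frac{\Delta^0}{2}\max_{1 \le j \le k+1}|v(\psi, \theta_j; \psi^0, \theta_j^0)| = -\frac{\Delta^0}{2}\rho(\phi, \phi^0). $$

Now, consider the other case, a change-point fraction configuration $\lambda$ with $\|\lambda - \lambda^0\|_\infty \ge \Delta^0/4$. Since consecutive true change-point fractions are at least $\Delta^0$ apart, there exists a pair of integers $(i, j)$ such that $n_{ij} \ge n\Delta^0/4$ and $n_{i,j+1} \ge n\Delta^0/4$; that is, some candidate segment $i$ contains at least $n\Delta^0/4$ observations from each of the adjacent true segments $j$ and $j+1$. Let $a_{i,j+1} = n_{i,j+1}/(n_{i,j+1} + n_{ij})$. For any $\phi$, we have that

$$ J_1 \le \frac{n_{ij} + n_{i,j+1}}{n}[a_{i,j+1}v(\psi, \theta_i; \psi^0, \theta_{j+1}^0) + (1 - a_{i,j+1})v(\psi, \theta_i; \psi^0, \theta_j^0)] \le \frac{n_{ij} + n_{i,j+1}}{n}\min(a_{i,j+1}, 1 - a_{i,j+1})G(\phi^0) $$

$$ = \min\Bigl(\frac{n_{i,j+1}}{n}, \frac{n_{ij}}{n}\Bigr)G(\phi^0) \le \frac{\Delta^0}{4}G(\phi^0) \le \frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2 G(\phi^0), $$

where the last inequality uses $\Delta^0 \le 1$ and $G(\phi^0) < 0$.

Combining the results from the two cases $\|\lambda - \lambda^0\|_\infty < \Delta^0/4$ and $\|\lambda - \lambda^0\|_\infty \ge \Delta^0/4$, it follows that

$$ J_1 \le G(\phi^0)\min\Bigl(\frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2, \|\lambda - \lambda^0\|_\infty\Bigr) \le \frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2 G(\phi^0)\|\lambda - \lambda^0\|_\infty $$

and

$$ J_1 \le \max\Bigl\{-\frac{\Delta^0}{2}\rho(\phi, \phi^0), \frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2 G(\phi^0)\Bigr\} = -\min\Bigl\{\frac{\Delta^0}{2}\rho(\phi, \phi^0), \frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2|G(\phi^0)|\Bigr\}. \qquad (B.1) $$

Note that (B.1) can be simplified. If we define

$$ q(\phi^0) = \max_{1 \le j \le k+1}\sup_{\theta_j \in \Theta_j}\sup_{\psi \in \Psi}|v(\psi, \theta_j; \psi^0, \theta_j^0)|, $$

then we have that $\rho(\phi, \phi^0)/q(\phi^0) \le 1$. If $(\Delta^0/2)\rho(\phi, \phi^0) > \frac{1}{2}(\Delta^0/2)^2|G(\phi^0)|$, then (B.1) gives that

$$ J_1 \le -\frac{1}{2}\Bigl(\frac{\Delta^0}{2}\Bigr)^2|G(\phi^0)| \le -\frac{(\Delta^0/2)^2|G(\phi^0)|}{2q(\phi^0)}\rho(\phi, \phi^0); $$

otherwise, $J_1 \le -(\Delta^0/2)\rho(\phi, \phi^0)$. Letting

$$ C_2 = \min\Bigl\{\frac{(\Delta^0/2)^2|G(\phi^0)|}{2q(\phi^0)}, \frac{\Delta^0}{2}\Bigr\}, $$

inequality (B.1) gives that $J_1 \le -C_2\rho(\phi, \phi^0)$. Setting $C_1 = (\Delta^0/2)^2|G(\phi^0)|/2$, we finally have that

$$ J_1 \le -\max\{C_1\|\lambda - \lambda^0\|_\infty, C_2\rho(\phi, \phi^0)\}, $$

which concludes the proof. □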



Proof of Lemma 2.2. Using Part 1 of Assumption 2.3, equation (6) can be established by induction on $m_2$. The induction is similar to the one used in Móricz, Serfling and Stout (1982), so the details are omitted here. Using Part 2 of Assumption 2.3, equation (7) can be proven by the same induction method. □

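Proof of Theorem 2.1. Let

$$ \Lambda = \{(\lambda_1, \lambda_2, \ldots, \lambda_k): \lambda_j = n_j/n,\ j = 1, 2, \ldots, k;\ 0 < n_1 < n_2 < \cdots < n_k < n\}, \qquad \Phi = \Theta_1 \times \Theta_2 \times \cdots \times \Theta_{k+1} \times \Psi, $$

$$ \Lambda_\delta = \{\lambda \in \Lambda: \|\lambda - \lambda^0\|_\infty > \delta\}, \qquad \Phi_\delta = \{\phi \in \Phi: \rho(\phi, \phi^0) > \delta\}. $$

Then, for any $\delta > 0$, it follows from Lemma 2.1 that

$$ -\max_{\lambda \in \Lambda_\delta,\, \phi \in \Phi} J_1 \ge C_1\delta \qquad \text{and} \qquad -\max_{\lambda \in \Lambda,\, \phi \in \Phi_\delta} J_1 \ge C_2\delta. $$

Since $J = 0$ at the true values $(\lambda^0, \phi^0)$, the maximum of $J$ over $\Lambda \times \Phi$ is non-negative. Therefore, we obtain that

$$ \Pr(\|\hat\lambda - \lambda^0\|_\infty > \delta) \le \Pr\Bigl(\max_{\lambda \in \Lambda_\delta,\, \phi \in \Phi} J \ge 0\Bigr) \le \Pr\Bigl(\max_{\lambda \in \Lambda_\delta,\, \phi \in \Phi} J_2 \ge -\max_{\lambda \in \Lambda_\delta,\, \phi \in \Phi} J_1\Bigr) \le \Pr\Bigl(\max_{\lambda \in \Lambda,\, \phi \in \Phi}|J_2| \ge C_1\delta\Bigr) $$

$$ \le \sum_{j=1}^{k+1}\Pr\Bigl(\max_{0 \le n_{j-1} < n_j \le n,\, \theta_j \in \Theta_j,\, \psi \in \Psi}\frac{1}{n}\Bigl|\sum_{i=n_{j-1}+1}^{n_j}\{\log f_j(\psi, \theta_j; X_i) - E[\log f_j(\psi, \theta_j; X_i)]\}\Bigr| \ge \frac{C_1\delta}{2(k+1)}\Bigr) $$

$$ + \sum_{j=1}^{k+1}\Pr\Bigl(\frac{1}{n}\Bigl|\sum_{i=n_{j-1}^0+1}^{n_j^0}\{\log f_j(\psi^0, \theta_j^0; X_i) - E[\log f_j(\psi^0, \theta_j^0; X_i)]\}\Bigr| \ge \frac{C_1\delta}{2(k+1)}\Bigr). $$

It follows from Lemma 2.2 that

$$ \Pr(\|\hat\lambda - \lambda^0\|_\infty > \delta) \le 2\sum_{j=1}^{k+1}A_j\Bigl(\frac{2(k+1)}{C_1\delta}\Bigr)^2 n^{r-2} \to 0 \qquad \text{as } n \to +\infty, $$

noting that $r < 2$.

For $\hat\phi$, we similarly obtain that

$$ \Pr(\rho(\hat\phi, \phi^0) > \delta) \le \Pr\Bigl(\max_{\lambda \in \Lambda,\, \phi \in \Phi_\delta} J \ge 0\Bigr) \le \sum_{j=1}^{k+1}\Pr\Bigl(\max_{0 \le n_{j-1} < n_j \le n,\, \theta_j \in \Theta_j,\, \psi \in \Psi}\frac{1}{n}\Bigl|\sum_{i=n_{j-1}+1}^{n_j}\{\log f_j(\psi, \theta_j; X_i) - E[\log f_j(\psi, \theta_j; X_i)]\}\Bigr| \ge \frac{C_2\delta}{2(k+1)}\Bigr) $$

$$ + \sum_{j=1}^{k+1}\Pr\Bigl(\frac{1}{n}\Bigl|\sum_{i=n_{j-1}^0+1}^{n_j^0}\{\log f_j(\psi^0, \theta_j^0; X_i) - E[\log f_j(\psi^0, \theta_j^0; X_i)]\}\Bigr| \ge \frac{C_2\delta}{2(k+1)}\Bigr). $$

Similarly, Lemma 2.2 shows that $\Pr(\rho(\hat\phi, \phi^0) > \delta) \to 0$ as $n \to +\infty$. Noting the fact that $v(\psi, \theta_j; \psi^0, \theta_j^0) = 0$ if and only if $\psi = \psi^0$ and $\theta_j = \theta_j^0$, it follows that $\hat\psi \to \psi^0$ and $\hat\theta_j \to \theta_j^0$ in probability for $j = 1, 2, \ldots, k+1$, which completes the proof. □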

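Proof of Theorem 2.2. Let us first define

$$ \Lambda_{\delta,n} = \{\lambda \in \Lambda: n\|\lambda - \lambda^0\|_\infty > \delta\} $$

for any $\delta > 0$. Because of the consistency of $\hat\lambda$, we need to consider only those terms in equation (5) whose observations are in $\Omega_{j,j-1}$, $\Omega_{jj}$ and $\Omega_{j,j+1}$ for all $j$. Therefore, we have

$$ P(n\|\hat\lambda - \lambda^0\|_\infty > \delta) \le \sum_{j=1}^{k+1} I_{1j} + \sum_{j=2}^{k+1} I_{2j} + \sum_{j=1}^{k} I_{3j}, $$

where

$$ I_{1j} = P\Bigl(\max_{\lambda \in \Lambda_{\delta,n},\, \phi \in \Phi}\Bigl\{\frac{1}{n}\sum_{t \in \Omega_{jj}}[\log f_j(\psi, \theta_j; X_t) - E(\log f_j(\psi, \theta_j; X_t))] - \frac{1}{n}\sum_{t \in \Omega_{jj}}[\log f_j(\psi^0, \theta_j^0; X_t) - E(\log f_j(\psi^0, \theta_j^0; X_t))] + \frac{J_1}{3(k+1)}\Bigr\} \ge 0\Bigr), $$

$$ I_{2j} = P\Bigl(\max_{\lambda \in \Lambda_{\delta,n},\, \phi \in \Phi}\Bigl\{\frac{1}{n}\sum_{t \in \Omega_{j,j-1}}[\log f_j(\psi, \theta_j; X_t) - E(\log f_j(\psi, \theta_j; X_t))] - \frac{1}{n}\sum_{t \in \Omega_{j,j-1}}[\log f_{j-1}(\psi^0, \theta_{j-1}^0; X_t) - E(\log f_{j-1}(\psi^0, \theta_{j-1}^0; X_t))] + \frac{J_1}{3k}\Bigr\} \ge 0\Bigr), $$

$$ I_{3j} = P\Bigl(\max_{\lambda \in \Lambda_{\delta,n},\, \phi \in \Phi}\Bigl\{\frac{1}{n}\sum_{t \in \Omega_{j,j+1}}[\log f_j(\psi, \theta_j; X_t) - E(\log f_j(\psi, \theta_j; X_t))] - \frac{1}{n}\sum_{t \in \Omega_{j,j+1}}[\log f_{j+1}(\psi^0, \theta_{j+1}^0; X_t) - E(\log f_{j+1}(\psi^0, \theta_{j+1}^0; X_t))] + \frac{J_1}{3k}\Bigr\} \ge 0\Bigr). $$

First, consider the probabilities $I_{1j}$ for any $j = 1, 2, \ldots, k+1$. The consistency of $\hat\lambda$ allows us to restrict our attention to the case $n_{jj} > (n_j^0 - n_{j-1}^0)/2$. For this case, we have that

$$ J_1 \le \frac{n_j^0 - n_{j-1}^0}{2n}v(\psi, \theta_j; \psi^0, \theta_j^0). $$

Since $E[\log f_j(\psi, \theta_j; X_t)] - E[\log f_j(\psi^0, \theta_j^0; X_t)] = v(\psi, \theta_j; \psi^0, \theta_j^0)$ for $t \in \Omega_{jj}$, we obtain that

$$ I_{1j} \le P\Bigl(\sum_{t \in \Omega_{jj}^*}\{[\log f_j(\psi^*, \theta_j^*; X_t) - \log f_j(\psi^0, \theta_j^0; X_t)] - v(\psi^*, \theta_j^*; \psi^0, \theta_j^0)\} \ge \frac{n_j^0 - n_{j-1}^0}{6(k+1)}|v(\psi^*, \theta_j^*; \psi^0, \theta_j^0)|\Bigr) $$

$$ \le P\Bigl(\max_{n_{j-1}^0 \le s < t \le n_j^0,\, \psi \in \Psi,\, \theta_j \in \Theta_j}\Bigl|\sum_{i=s+1}^{t}\{[\log f_j(\psi, \theta_j; X_i) - \log f_j(\psi^0, \theta_j^0; X_i)] - v(\psi, \theta_j; \psi^0, \theta_j^0)\}\Bigr| \ge \frac{n_j^0 - n_{j-1}^0}{6(k+1)}|v(\psi^*, \theta_j^*; \psi^0, \theta_j^0)|\Bigr), $$

where $\Omega_{jj}^*$, $\psi^*$, $\theta_j^*$ and $\lambda^*$ are, respectively, the maximizing values of $\Omega_{jj}$, $\psi$, $\theta_j$ and $\lambda$ obtained through the maximization. Equation (7) of Lemma 2.2 can then be applied to show that $I_{1j} \to 0$ as $n, \delta \to \infty$.

Next, consider the probabilities $I_{2j}$ for any $j = 2, \ldots, k+1$. In this case, $\lambda_{j-1} < \lambda_{j-1}^0$. We have that $I_{2j} \le I_{2j}^{(1)} + I_{2j}^{(2)}$, where

$$ I_{2j}^{(1)} = P\Bigl(\max_{\lambda \in \Lambda_{\delta,n},\, \phi \in \Phi}\Bigl\{\frac{1}{n}\sum_{t \in \Omega_{j,j-1}}[\log f_j(\psi, \theta_j; X_t) - E(\log f_j(\psi, \theta_j; X_t))] + \frac{J_1}{6k}\Bigr\} \ge 0\Bigr), $$

$$ I_{2j}^{(2)} = P\Bigl(\max_{\lambda \in \Lambda_{\delta,n},\, \phi \in \Phi}\Bigl\{-\frac{1}{n}\sum_{t \in \Omega_{j,j-1}}[\log f_{j-1}(\psi^0, \theta_{j-1}^0; X_t) - E(\log f_{j-1}(\psi^0, \theta_{j-1}^0; X_t))] + \frac{J_1}{6k}\Bigr\} \ge 0\Bigr). $$

$I_{2j}^{(1)}$ and $I_{2j}^{(2)}$ can be handled in the same way, so we only show how to handle $I_{2j}^{(1)}$. By Lemma 2.1, $J_1 \le -C_1\|\lambda - \lambda^0\|_\infty \le -C_1\delta/n$ on $\Lambda_{\delta,n}$. Only two cases have to be considered. If $n_{j-1}^0 - n_{j-1} < \delta$, then

$$ I_{2j}^{(1)} \le P\Bigl(\max_{n_{j-1}^0 - \delta \le s < t \le n_{j-1}^0,\, \theta_j \in \Theta_j,\, \psi \in \Psi}\Bigl|\sum_{i=s+1}^{t}[\log f_j(\psi, \theta_j; X_i) - E(\log f_j(\psi, \theta_j; X_i))]\Bigr| \ge \frac{C_1\delta}{6k}\Bigr), $$

and equation (6) of Lemma 2.2 gives that

$$ I_{2j}^{(1)} \le A_j\Bigl(\frac{6k}{C_1\delta}\Bigr)^2\delta^r \to 0 \qquad \text{as } \delta \to \infty. $$

If $n_{j-1}^0 - n_{j-1} \ge \delta$ in the other case, then $J_1 \le -C_1(n_{j-1}^0 - n_{j-1})/n$. Therefore, we obtain that

$$ I_{2j}^{(1)} \le P\Bigl(\max_{n_{j-1} \le s < t \le n_{j-1}^0,\, \theta_j \in \Theta_j,\, \psi \in \Psi}\frac{1}{n_{j-1}^0 - n_{j-1}}\Bigl|\sum_{i=s+1}^{t}[\log f_j(\psi, \theta_j; X_i) - E(\log f_j(\psi, \theta_j; X_i))]\Bigr| \ge \frac{C_1}{6k}\Bigr), $$

which converges to zero as $n, \delta \to \infty$, by equation (6) of Lemma 2.2.

$I_{3j}$ can be handled in the same way as $I_{2j}$. Therefore, Theorem 2.2 is proved. □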

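Proof of Theorem 2.3. We first have the expansion

$$ l_\phi(\hat\psi, \hat\theta) = l_\phi(\psi^0, \theta^0) + l_{\phi\phi}(\tilde\psi, \tilde\theta)(\hat\phi - \phi^0), $$

where $\tilde\phi = (\tilde\psi, \tilde\theta)$ lies between $\hat\phi$ and $\phi^0$ and $l$ is evaluated at the estimated change points $\hat n_1, \ldots, \hat n_k$. The fact that $l_\phi(\hat\psi, \hat\theta) = 0$ then gives that

$$ \sqrt{n}(\hat\phi - \phi^0) = \Bigl[-\frac{1}{n}l_{\phi\phi}(\tilde\psi, \tilde\theta)\Bigr]^{-1}\frac{1}{\sqrt{n}}l_\phi(\psi^0, \theta^0) = \bar i(\psi^0, \theta^0)^{-1}\frac{1}{\sqrt{n}}l_\phi(\psi^0, \theta^0) + o_p(1). $$

Now, consider the limit of $l_\phi(\psi^0, \theta^0)/\sqrt{n}$. We have that

$$ \frac{1}{\sqrt{n}}l_\phi(\psi^0, \theta^0) = \frac{1}{\sqrt{n}}\sum_{j=1}^{k+1}\sum_{i=\hat n_{j-1}+1}^{\hat n_j}\frac{\partial}{\partial\phi}\log f_j(\psi^0, \theta_j^0; X_i). $$

Because of the consistency of $\hat\lambda$, we can assume that $n_{j-1}^0 < \hat n_j < n_{j+1}^0$ for $j = 1, 2, \ldots, k$. It is then straightforward to obtain that

$$ \frac{1}{\sqrt{n}}l_\phi(\psi^0, \theta^0) = \frac{1}{\sqrt{n}}l^{(0)}_\phi(\psi^0, \theta^0) + \frac{1}{\sqrt{n}}\sum_{j=1}^{k}\sum_{i=n_j^0+1}^{\hat n_j}\Bigl[\frac{\partial}{\partial\phi}\log f_j(\psi^0, \theta_j^0; X_i) - \frac{\partial}{\partial\phi}\log f_{j+1}(\psi^0, \theta_{j+1}^0; X_i)\Bigr], $$

with the convention $\sum_{i=a+1}^{b} = -\sum_{i=b+1}^{a}$ when $b < a$. It follows from Theorem 2.2 that, for each $j$,

$$ \frac{1}{\sqrt{n}}\sum_{i=n_j^0+1}^{\hat n_j}\Bigl[\frac{\partial}{\partial\phi}\log f_j(\psi^0, \theta_j^0; X_i) - \frac{\partial}{\partial\phi}\log f_{j+1}(\psi^0, \theta_{j+1}^0; X_i)\Bigr], $$

which involves only $|\hat n_j - n_j^0| = O_p(1)$ terms, converges to zero in probability as $n \to \infty$. Since

$$ \frac{1}{\sqrt{n}}l^{(0)}_\phi(\psi^0, \theta^0) \xrightarrow{d} N\bigl(0, \bar i(\psi^0, \theta^0)\bigr), $$

it follows that

$$ \frac{1}{\sqrt{n}}l_\phi(\psi^0, \theta^0) \xrightarrow{d} N\bigl(0, \bar i(\psi^0, \theta^0)\bigr). $$

In a similar way, we easily obtain that

$$ -\frac{1}{n}l_{\phi\phi}(\tilde\psi, \tilde\theta) \xrightarrow{p} \bar i(\psi^0, \theta^0). $$

Therefore, we have that

$$ \sqrt{n}(\hat\phi - \phi^0) \xrightarrow{d} N_{d+d_1+\cdots+d_{k+1}}\bigl(0, \bar i(\psi^0, \theta^0)^{-1}\bigr), $$

proving the result. □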